Search CORE

116 research outputs found

Advances in All-Neural Speech Recognition

Author: Droppo J.
Stolcke A.
Yu C.
Zweig G.
Publication date: 25/01/2017

This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly used NIST 2000 conversational telephony test set, and significantly exceed the previously published performance of similar systems, both with and without the use of an external language model and decoding technology.

arXiv.org e-Print Archive

Crossref

The Microsoft 2017 Conversational Speech Recognition System

Author: Alleva F.
Droppo J.
Huang X.
Stolcke A.
Wu L.
Xiong W.
Publication date: 24/08/2017

We describe the 2017 version of Microsoft's conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.
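For readers unfamiliar with the metric quoted above, word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the scoring tool used in the paper, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative sentences: one substitution and one deletion over six words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {wer:.1%}")
```

Production scoring normally also applies text normalization before alignment, which this sketch leaves out.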

arXiv.org e-Print Archive

Crossref

The Microsoft 2016 Conversational Speech Recognition System

Author: Droppo J.
Huang X.
Seide F.
Seltzer M.
Stolcke A.
Xiong W.
Yu D.
Zweig G.
Publication date: 25/01/2017

We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task.

arXiv.org e-Print Archive

Crossref

Adversarial Reweighting for Speaker Verification Fairness

Author: Chen Zeya
Droppo Jasha
Jin Minho
Ju Chelsea J. -T.
Liu Yi-Chieh
Stolcke Andreas
Publication date: 15/07/2022

We address performance fairness for speaker verification using the adversarial reweighting (ARW) method. ARW is reformulated for speaker verification with metric learning, and shown to improve results across different subgroups of gender and nationality, without requiring annotation of subgroups in the training data. An adversarial network learns a weight for each training sample in the batch so that the main learner is forced to focus on poorly performing instances. Using a min-max optimization algorithm, this method improves overall speaker verification fairness. We present three different ARW formulations: accumulated pairwise similarity, pseudo-labeling, and pairwise weighting, and measure their performance in terms of equal error rate (EER) on the VoxCeleb corpus. Results show that the pairwise weighting method can achieve 1.08% overall EER, 1.25% for male and 0.67% for female speakers, with relative EER reductions of 7.7%, 10.1% and 3.0%, respectively. For nationality subgroups, the proposed algorithm showed 1.04% EER for US speakers, 0.76% for UK speakers, and 1.22% for all others. The absolute EER gap between gender groups was reduced from 0.70% to 0.58%, while the standard deviation over nationality groups decreased from 0.21 to 0.19.
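The gender-gap figures in this abstract can be cross-checked with simple arithmetic. The sketch below is not from the paper: it assumes each relative reduction is measured against a baseline EER, so that baseline = new EER / (1 - reduction), and the function and variable names are illustrative.

```python
def baseline_eer(new_eer: float, relative_reduction_pct: float) -> float:
    """Recover the baseline EER implied by a relative reduction (assumption:
    reduction = (baseline - new) / baseline)."""
    return new_eer / (1.0 - relative_reduction_pct / 100.0)


# EER values (in percent) and relative reductions as quoted in the abstract.
male_new, female_new = 1.25, 0.67
male_base = baseline_eer(male_new, 10.1)
female_base = baseline_eer(female_new, 3.0)

gap_after = male_new - female_new      # absolute gap with pairwise weighting
gap_before = male_base - female_base   # implied absolute gap for the baseline

if __name__ == "__main__":
    print(f"implied baseline gap: {gap_before:.2f}")  # prints 0.70
    print(f"gap after ARW:        {gap_after:.2f}")   # prints 0.58
```

Under that assumption, the implied values agree with the reported reduction of the gender gap from 0.70% to 0.58%.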

arXiv.org e-Print Archive

Development of novel 2D and 3D correlative microscopy to characterise the composition and multiscale structure of suspended sediment aggregates.

Author: Agrawal
Andrew J. Bushby
Arganda-Carreras
Azam
Burd
Burger
Burnett
Bushby
Bushby
Caplan
Cnudde
Dazzo
Droppo
Droppo
Handschuh
Heissenberger
Holzer
Ian G. Droppo
Jarvis
Jonathan A.T. Wheatland
Kate L. Spencer
Ketcham
Khelifa
Lee
Leppard
Leppard
Liss
Liss
Maggi
Manning
Nguyen
Nguyen
Ollion
O’Shea
Peachey
Preibisch
Righetti
Rummel
Rusconi
Schindelin
Sharma
Simon J. Carr
Soulsby
Tolhurst
Ward
Wheatland
Winterwerp
Zhang
Publication venue: Elsevier BV
Publication date: 17/04/2020

Suspended cohesive sediments form aggregates or 'flocs' and are often closely associated with carbon, nutrients, pathogens and pollutants, which makes understanding their composition, transport and fate highly desirable. Accurate prediction of floc behaviour requires the quantification of 3-dimensional (3D) properties (size, shape and internal structure) that span several scales (i.e. nanometre [nm] to millimetre [mm]-scale). Traditional techniques (optical cameras and electron microscopy [EM]), however, can only provide 2-dimensional (2D) simplifications of 3D floc geometries. Additionally, the existence of a resolution gap between conventional optical microscopy (COM) and transmission EM (TEM) prevents an understanding of how floc nm-scale constituents and internal structure influence mm-scale floc properties. Here, we develop a novel correlative imaging workflow combining 3D X-ray micro-computed tomography (μCT), 3D focused ion beam nanotomography (FIB-nt) and 2D scanning EM (SEM) and TEM (STEM) which allows us to stabilise, visualise and quantify the composition and multiscale structure of sediment flocs for the first time. This new technique allowed the quantification of 3D floc geometries, the identification of individual floc components (e.g., clays, non-clay minerals and bacteria), and characterisation of particle-particle and structural associations across scales. This novel dataset demonstrates the truly complex structure of natural flocs at multiple scales. The integration of multiscale, state-of-the-art instrumentation/techniques offers the potential to generate fundamental new understanding of floc composition, structure and behaviour.

Crossref

Queen Mary Research Online

Insight - University of Cumbria

Decision tree-based acoustic models for speech recognition

Author: J Ajmera
J Ajmera
J Ajmera
J Droppo
Jitendra Ajmera
JT Foote
L Breiman
Masami Akamine
OR Duda
PC Woodland
R Teunen
S Young
V Tyagi
X Chen
Publication venue: Springer Science and Business Media LLC

Crossref

THE MICROSOFT 2016 CONVERSATIONAL SPEECH RECOGNITION SYSTEM

Author: A Stolcke
D Yu
F Seide
G Zweig
J Droppo
M Seltzer
W Xiong
X Huang
Publication date: 06/03/2020

We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task.

CiteSeerX

Hydrodynamic coupling in microbially mediated fracture mineralization: formation of self-organized groundwater flow channels

Author: Abelin
Andre
Bang
Beveridge
Birkholzer
Bourke
Bradford
Brown
Brown
Budai
Chen
Cheong
Cunningham
Cuthbert
Cuthbert
DeJong
DeNovio
Derjaguin
Detwiler
Droppo
Droppo
Durham
Durham
Erica MacLachlan
Ferris
Ferris
Ferris
Fridrich
Fujita
Gadd
Gargiulo
Gil'man
Glass
Gollapudi
Gráinne El Mountassir
Hammes
Harkes
Heath
Heather Moir
Heim
Higgins
Hilgers
Hogg
Holmqvist
Inoue
Johnson
Juniper
Kirby
Konhauser
Leopold
Li
Metcalfe
Mitchell
Mitchell
Moreno
Muynck
Neretnieks
Nollet
Nordqvist
Paassen
Parks
Pedersen
Pedersen
Pedersen
Phillips
Rebecca J. Lunn
Rijn
Rodriguez-Blanco
Rubert
Schryver
Schultz
Schultze-Lam
Segall
Stocks-Fischer
Stoner
Tittelboom
Tobler
Tobler
Trewin
Tsang
Tufenkji
Verwey
Wan
Whiffin
Yao
Zimmermann
Publication venue: Wiley
Publication date: 25/02/2014

Evidence of fossilized microorganisms embedded within mineral veins and mineral-filled fractures has been observed in a wide range of geological environments. Microorganisms can act as sites for mineral nucleation and also contribute to mineral precipitation by inducing local geochemical changes. In this study, we explore fundamental controls on microbially induced mineralization in rock fractures. Specifically, we systematically investigate the influence of hydrodynamics (velocity, flow rate, aperture) on microbially mediated calcite precipitation. Our experimental results demonstrate that a feedback mechanism exists between the gradual reduction in fracture aperture due to precipitation, and its effect on the local fluid velocity. This feedback results in mineral fill distributions that focus flow into a small number of self-organizing channels that remain open, ultimately controlling the final aperture profile that governs flow within the fracture. This hydrodynamic coupling can explain field observations of discrete groundwater flow channeling within fracture-fill mineral geometries where strong evidence of microbial activity is reported.

Crossref

University of Strathclyde Institutional Repository

A neighborhood statistics model for predicting stream pathogen indicator levels

Author: CR Rehmann
G Wilkes
Gregory B. Pasternack
IG Droppo
J Besag
JJ Rothwell
JJ Rothwell
JW Kim
JW Nagels
KE Schilling
KH Cho
Mahbubul Majumder
Mark S. Kaiser
Michelle L. Soupir
MR Hipsey
MS Kaiser
N Cressie
PB Parajuli
PK Pandey
PK Pandey
Pramod K. Pandey
RC Jamieson
RW Muirhead
S Bai
SM Dorner
YA Pachepsky
Publication venue: Springer Science and Business Media LLC

Crossref

Using Avrami equation in the studies on changes in granulometric composition of algal suspension

Author: A Bluma
A Khalil
A Kozak
A Krumme
A Lyche-Solheim
A Quirantes
A Quirantes
AHJ Cloot
AT Lorenzo
D Martin
D Ovono Ovono
D Richter
D Richter
E Burszta-Adamiak
G Bushell
H Hoz Siegler De la
H Mazur-Marzec
IG Droppo
J Málek
J Vázquez
J Wollschläger
J Wu
J Xiao
J Yao
JA Martins
Janusz Łomotowski
JJ García-Mesa
JQ Mao
KH Cho
KJ Flynn
L Liu
L Sitoki
M Bizi
M Sperazza
Magdalena Kuśnierz
MBG Souza
MG Lu
ML Herrera
ML Lorenzo Di
MN Kaggwa
MP Stoyneva
MT Dokulil
MT Todinov
P Supaphol
PW Lehman
RA Judge
S Fiore
S Morin
S Rolinski
T Berge
T Martinez
T Stelzer
TH Tran
TH Tran
VA Fernandes
X-Y Li
Y Yang
Publication venue: Springer Science and Business Media LLC

Crossref